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1.  Introdnction 


Constrained  parameter  problems  arise  in  a  wide  variety  of  applications,  including 
bioassay,  actuarial  graduation,  ordinal  categorical  data,  response  surfaces,  reliability 
development  testing  and  variance  component  models.  Truncated  data  problems  —  to  be 
understood  as  encompassing  both  censoring  and  scoring  or  grouping  mechanisms  —  arise 
naturally  in  survival  and  failure  time  studies,  ordinal  data  models  and  categorical  data 
studies  aimed  at  uncovering  underlying  continuous  distributions.  In  many  applications, 
parameter  constraints  and  data  truncation  are  both  present. 

The  statistical  literature  on  such  problems  is  very  extensive,  reflecting  both  their 
widespread  occurrence  in  applications  and  the  methodological  challenges  which  they  pose. 
However,  it  is  striking  that  hardly  any  of  this  applied  and  theoretical  literature  involves  a 
parametric  Bayesian  perspective  (although  some  exceptions  will  be  noted  in  the  context  of 
the  examples  given  later,  in  Section  4).  From  a  technical  perspective,  this  is  perhaps  not 
difficult  to  understand.  The  fundamental  tool  for  Bayesian  calculations  in  typical  realistic 
models  is  (multi-dimensional)  numerical  integration.  This  is  often  problematic  in 
unconstrained  contexts;  it  can  be  well-nigh  impossible  for  the  kinds  of  constrained 
problems  we  shall  be  considering. 

Our  goal  in  this  paper  is  to  show  that  Bayesian  calculations  qan  be  routinely 
implemented  for  constrained  parameter  and  truncated  data  problems  by  means  of  the 
Gibbs  sampler  (introduced  by  Geman  and  Geman,  1984,  in  the  context  of  image  processing 
and  subsequently  proposed  as  a  general  method  for  Bayesian  calculations  by  Gelfand  and 
Smith,  1990). 

In  general,  we  shall  assume  that  the  desired  outcome  of  the  Bayesian  analysis  is  the 
calculation  and  display  of  marginal  posterior  (predictive)  densities  of  parameters 
(unobserved  data)  of  interest,  although  often  summaries  (for  example,  via  modes, 
moments,  quantiles)  will  suffice.  As  we  shall  see,  the  Gibbs  sampler  will  provide  the  basis 
for  whatever  form  of  final  inference  summary  we  require. 

In  Section  2,  we  briefly  review  the  Gibbs  sampler  and  comment  on  experience  with 
its  use  for  other  classes  of  statistical  problems.  In  Section  3,  we  present  a  general  overview 
formulation  of  the  structure  of  constrained  parameter  and  truncated  data  problems  and  the 
resulting  form  of  the  Gibbs  sampler.  In  Section  4,  we  develop  detailed  analyses  for  a 
variety  of  examples.  These  are  chosen  with  a  view  to  giving  the  reader  an  appreciation  of 
the  power  and  scope  of  the  Gibbs  sampler  in  reducing  seemingly  impossibly  complex 
romputati  >uai  iasko  to  siiupie,  easily  implemented,  iterative  sampling  schemes.  In  Section 
5,  we  provide  illustrative  analyses  of  two  artificial  data  sets,  generated  from  models  chosen 
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to  present  extremely  awkward  inference  problems.  Finally,  in  Section  6  we  provide  a 
summary  discussion. 


2.  The  Gibbs  Sampler 


Our  subsequent  development  will  use  the  following  notational  conventions. 
Densities  will  be  developed  generically  by  square  braclets,  so  that  joint,  conditional  and 
marginal  forms  for  random  variables  U,V,  appear,  respectively,  as  [U,V]  [U|V]  and  [V]. 
The  usual  marginalization  by  integration  is  denoted  by  forms  such  as  [U]  =  |  [U  |  V]  *  [V]. 
For  a  collection  of  random  variables  [Up  Uj,— ,UJ  the  full  conditional  densities  can  thus 

be  denoted  by  [U  |U  ,  r  t  s],  s  =  l,2,...,k,  and  the  marginal  densities  by  [UJ, 

D  I  8 

s  =  l,2,...,k. 

Suppose  we  now  consider  the  following  problem.  Given  the  ability  to  draw  random 

variate  samples  of  U  from  [U  |U  ,  r#sl  for  specified  values  of  U.  r^s,  s  =  l,2,...,k,  can 
B  8  *  x  r 

A 

we  find  an  iterative  scheme  which  enables  us  to  make  sample-based  estimates,  [Ug],  say, 

of  the  marginal  densities,  [U  ],  s  =  l,2,...,k?  We  shall  make  the  connection  with  Bayesian 

s 


posterior  calculations  later;  for  the  moment,  we  note  that  the  general  question  is  answerrd 
affirmatively  by  the  following  procedure. 

Gibbs  sampling  is  a  Markovian  updating  scheme  which  proceeds  as  follows.  Given 
an  arbitrary  starting  set  of  values  uj^,...,u£  ^  we  draw  uW  from  [UjID^),...^0)], 

then  U<*>  from  [U2|uW,  u|°> . u[0)]-"  and  so  on  up  to  if)  from 

[Ukiui1)...,^)]  to  complete  one  iteration  of  the  scheme.  After  t  such  iterations  we 

would  arrive  at  joint  a  sample  (U^,.-  >uj^).  Geman  and  Geman  show  under  mild 


conditions  that  3  (Up...,U^)  -  as  t  -♦  «.  Hence  for  t  large 

enough,  uj^  for  example  can  be  regarded  as  a  sample  variate  from  [U  ].  Parallel 

o  8 

replication  of  this  process  n  times  yields  m  iid  k-tuples  (U^,...,  u^)  )  =  1,2 . m. 

Note  that  sample  size  at,  say,  the  w-th  iteration  may  be  increased  from  m  to  any 
specified  size  by  rjmdomly  reusing  the  U  with  replacement. 


sj 


ft) 


A  kernel  density  estimate  for  [U  j  based  upon  t  .e  0';;  can  oe  readily  obtained 
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(see  e.g.,  Silverman,  1986)  and  should  be  adequate  if,  a  the  last  iteration  the  number  of 
replications,  m,  is  large  enough.  However,  using  a  Rac-  Blackwell  argument  (see  Gelfand 
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and  Smith,  1990)  a  density  estimate  of  the  form 


£ 

j=l 


(t) 

[Us|Urj  ,iM/m 


(1) 


is  better  under  a  wide  range  of  loss  functions.  This  is  not  surprising  since  (1)  takes 
advantage  of  the  known  structure  in  the  model  whereas  the  kernel  density  estimate  does 
not.  The  form  (1)  is  a  discrete  mixture  distribution  and  is  essentially  a  Monte  Carlo 
integration  to  accomplish  the  desi' d  marginalization.  Similarly,  the  expectation  E(h(Ug)) 

(t) 

can  be  obtained  either  as  a  sample  estimate  based  upon  the  U  ■  or  possibly  as  a 

sj 

"Rao— Blackwellized"  version  analogous  to  (1)  based  upon  E(h(Us)|Ur>  r  ^  s).  Now 

(t)  (t) 

consider  a  function  of  the  U-,  say  W(Uj,...,U^).  Each  k-tuple,  (Uj.  ,...,U^j  ), 

m  1  (0  (0 

provides  an  observed  W\  '  =  W(Ujj  ,...,U^j  )  whose  marginal  distribution,  as  t  -»  ®,  is 
approximately  [W],  whence  a  kernel  density  estimator  for  [W]  using  these  can  be 
developed.  If,  say,  U  actually  appears  as  an  argument  of  W  the  full  conditional  density 

O 

[W|Ur,  r  #  s]  can  be  obtained  by  univariate  transformation  from  [Ug|Ur,  r  #  s].  Thus  a 

"Rao-Blackwellized"  density  estimate  for  [W]  analogous  to  (1)  can  also  be  obtained. 

In  the  Bayesian  context  the  U-  are  the  unknown  parameters  (or  possibly 


unobserved  data)  in  the  model.  W  would  be  any  function  of  the  parameters  (or 
unobserved  data)  which  is  of  interest.  All  distributions  are  viewed  as  conditional  on  the 
observed  data,  whence  the  marginal  densities,  [U  ],  become  the  desired  marginal 

S 


posterior  distributions  of  the  parameters  (or  unobserved  data). 

So  far  as  ease  of  drawing  samples  from  the  full  conditional  distributions  is 
concerned,  in  many  cases  the  likelihood  and  prior  forms  specified  in  the  Bayesian  model 
lead  to  familiar  standard  full  conditional  forms,  such  as  normals  and  gammas,  and 
implementation  is  immediate.  In  other  cases,  we  simply  have,  up  to  proportionality,  a 
mathematical  form  'or  the  full  conditional  and  must  employ  tailored  versions  of  general 
random  variate  generating  procedures  such  as  the  ratio— uniforms  and  reaction  methods 
(see,  for  example,  Devroye,  1986,  or  Ripley,  1986). 

Finally,  we  note  that  complete  implementation  of  the  Gibbs  sampler  requires  that  a 
determination  of  t  be  made  and  that  across  iterations,  choice(s)  of  m  be  specified.  In  a 
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challenging  application  some  experimentation  with  different  settings  for  t  and  m  will 
likely  be  necessary.  We  do  not  view  this  as  a  deterrent  since  random  generation  is 
generally  inexpensive  and  since  in  many  cases  there  may  be  no  feasible  alternative.  In  the 
examples  of  Section  5,  convergence  was  evaluated  in  a  univariate  manner  by  plotting 
marginal  posterior  density  estimates  of  the  form  (1)  five  iterations  apart  in  order  to  judge 
stability.  Typically,  a  somewhat  small  m  is  used  until  convergence  is  concluded,  at  which 
point,  for  a  final  iteration,  m  is  increased  by  an  order  of  magnitude  to  develop  the 
presentM  density  estimate.  We  make  no  claims  for  the  optimality  of  this  procedure. 
Assessment  of  convergence  is  a  complex  issue  which  is  currently  very  much  in  the  empirical 
domain. 

However,  extensive  computational  experience  in  a  wide  variety  of  parametric  models 
has  given  us  considerable  confidence  in  the  pragmatic  procedure  described  above.  See,  for 
example:  Gelfand  and  Smith  (1989,  1990);  Gelfand  et  al  (1989);  Carlin  et  al  (1989); 
Racine-Poon  gt  id  (1990). 

Since  the  dted  papers  contain  extensive  discussion  of  very  detailed  specification  of 
the  Gibbs  sampler  in  a  number  of  situations,  we  avoid  unnecessary  detail  in  what  follows, 
concentrating  instead  on  the  structural  insights  which  underlie  the  specification  of  the 
required  full  conditionals.  Having  followed  the  general  discussion,  the  reader  can  easily 
supply  the  missing  detail  in  any  specific  example. 

2L  Models:  General  Structure 

In  this  section,  we  provide  a  discussion  of  the  Gibbs  sampler  structures  arising  from 
rather  general  formulations  of  Bayesian  parametric  versions  of  constrained  parameter  and 
truncated  data  problems.  As  indicated  in  Section  2,  the  implementation  problem  reduces 
to  identification  of  the  appropriate  full  conditional  distributions  and  methods  for  drawing 
samples  from  them. 

3.1.  Constrained  Parameter  Models 

Consider  a  parametric  model  for  data  Y  involving  a  k-dimensional  parameter 

k  *  k  k 

vector  6,  constrained  to  be  in  a  subset  Sy  of  R  .  Often  the  constraint  set  Sy  is 
determined  by  order  or  other  inequality  relationships  among  the  components  0-,  i  = 
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k 

l,2,...,k,  of  9 ,  in  which  case,  S  =  Sy  does  not  depend  on  Y.  In  other  cases, 

constraints  occur  because  the  region  of  positive  support  for  the  distribution  of  Y  depends 

k 

on  9,  so  that  Y  occurs  explicitly  in  Sy  :  see,  for  example,  Section  4.5,  where  Y  = 

(YrY2,...,Yk)  and  0.  <  Yj,  i  =  l,2,...,k.  In  the  former  case,  it  is  natural  to  think  of  the 

constraint  as  built  into  the  specification  of  the  prior  distribution,  [0|  A],  where  A  is  some 

hyperparameter;  in  the  latter  case,  it  is  natural  to  think  of  the  constraint  as  built  into  the 
likelihood,  [Y|  S\.  In  either  case,  it  suffices  to  note  that  the  constrained  Bayesian  model 

(likelihood  x  prior)  is  given  by 

(  Ey(f)  -  [f|  A] 

l  0 

k 

where  S  =  {(y,0):0eS  }.  In  general,  [Y|0],  [0|A],  as  functions  of  9  have  the  functional 

J 

forms  they  would  have  had  if  constraints  were  ignored.  It  follows  immediately 
(generalizing  slightly  a  remark  in  Box  and  Tiao,  1973,  p.  67)  that  the  posterior  distribution 
for  9,  given  the  constraints,  is  simply  the  unconstrained  posterior  appropriated 

normalized  so  that 


(y,  ?)  e  S 

(y,  0)  i  S 


[fiy]  = 


[yig]  •  m\ 

fS k  [Y|0]  •  [*|A] 

3y  -  - 


(2) 


k  k 

Now  let  Sj  (0j,j£i)  denote  the  cross-section  of  Sy  defined  by  the  constraints  on 

component  &  at  a  specified  set  of  values  9y  j  t  i  (where,  for  the  cross-section,  we  have 
for  notational  convenience  suppressed  possible  dependence  on  Y).  In  the  case  of  scalar 
k-  1 

components,  S-  (0j,  j  i  i)  is  a  subset  of  R  ,  typically  an  interval  or  a  collection  of 

intervals.  It  then  follows  immediately  from  (2)  that  the  full  posterior  conditional 
distribution  for  6 t  is  defined  by 
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(3) 


where  the  right-hand  side  is  regarded  as  a  function  of  0-  for  specified  0j,  #i.  When  for 

0j  the  likelihood  and  prior  combine  to  give  a  conjugate  Bayesian  form,  the  unconstrained 

version  of  the  full  conditional  for  0.  will  be  a  familiar  standard  distribution,  defined  by 

the  conjugate  prior  to  posterior  updating.  The  constrained  form  (3)  will  then  simply  be  the 

k 

standard  distribution  restricted  to  S.(0j,  j#). 

This  latter  point  is  critical.  Regardless  of  how  complicated  the  overall  constraint 
k  k 

set  Sy  is,  to  implement  the  Gibbs  sampler  we  need  only  consider  Sy  in  univariate 


cross-sections.  Moreover,  to  carry  out  the  actual  sampling  we  need  only  identify  the  full 
conditionals  under  the  unconstrained  model  and  then  make  the  restriction  to  the 
cross-sections. 

One  way  of  doing  this  is  simply  to  generate  from  the  unconstrained  full  conditional, 
and  retain  the  variate  value  only  if  it  falls  in  the  cross-section  constraint  region. 
Alternately,  suppose  the  form  of  the  distribution  function,  Fj,  say,  of  the  full  conditional 

for  0-  is  available  and  the  cross-section  is  an  interval,  [a,b],  say.  Then,  if  U  is  a  uniform 

-1 

(0,1)  variate,  it  is  noted  by  Devroye  (1986,  p.38)  that  0j  =  Fj  [Fj(a)  +  U^b)  -  F^a))] 


is  a  drawing  from  the  constrained  full  conditional.  Thus  we  sample  "one-for-one"  from 

the  constrained  full  conditional.  This  is  easily  extended  to  the  case  where  the  cross-section 

r 


is  a  collection  of  disjoint  intervals,  U  [a.,bj,  say.  In  this  case,  we  choose  J=j  with 

j=i 1  r 

probability  [  S(Fj(bp  -  F^))]'1  (F;(b.)  -  Fj(a.)]  and,  given  j,  set  0,  =  Fj  [Fj(ap  + 


U(Fj(bj)  —  Fj(aj))],  where  U  again  is  a  uniform  (0,1)  variate. 

In  general,  sampling  from  constrained  full  conditionals  will  not  be  particularly 
efficient,  especially  in  the  case  of  nonstandard,  unnormalized  distributions.  But  this  is 
more  than  compensated  for  by  the  striking  ease  of  implementation  of  the  Gibbs  sampler, 
enabling  one  to  carry  out  full  Bayesian  calculations  for  complex  constrained  parameter 
problems  which  were  previously  unanalyzable  by  standard  numerical  integration 
techniques. 

Finally,  we  note  a  further  feature  thal.arises  in  impleme  ring  the  Gibbs  sampler 
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were  we  to  seek  to  extend  the  above  to  a  hierarchical  model  by  adding  a  prior  [A]  for  the 
hyperparameter  A,  thus  far  assumed  to  be  known.  The  full  conditionals  for  the  0-  are 
unchanged  and  are  given  by  (3).  However,  the  full  conditional  for  A  doesn’t  depend  upon 
Y  and  takes  the  form 


[X\Yt0\  a  [0|  A]  [A]  c(A).  (4) 

where  c(A)  =  (J§[Y|  0][0|  A])- *.  If  0  is  not  constrained  by  Y,  c(A)  simplifies  to 
(^k[^!^])_1  but  regardless  c(A)  will  typically  not  be  available  explicitly  (see,  e.g. 

Sections  4.1,  4.2)  making  sampling  from  (4)  almost  impossible. 

3.2  Censored  Data  Models 

To  develop  a  general  framework  for  censored  data  models,  consider  random 
n— vectors  Y,V,W  with  joint  density  defined  by 

[y,y,w  |  e,rj]  =  [yiy,w,f]  •  [v,w|?] 


in  terms  of  parameters  0  and  rj,  and  define  component-wise  a  further  random  n-vector  Z 

by 


V. 

Y.  <  V. 

J 

J  -  J 

Yi  if 

VYj<wj 

•  w. 

Y.  >  W. 

J 

J  '  J 

j=l,2,...,n 


(5) 


We  shall  consider  Z  to  be  observed  data  arising  as  a  censored  form  of  Y  through  the 
censoring  process  [V,W|  77],  with  V  and  W  also  observed.  In  this  very  general 
formulation,  V  and  W  are  random,  but  the  process  could,  of  course,  be  degenerate  for 
either  or  both.  In  particular,  right  or  left  censoring  only  (corresponding  to  Wj  =  -®,  Vj  = 
+®,  respectively)  are  include  as  special  cases. _ 
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To  complete  the  Bayesian  model,  let  us  assume  that  prior  distributions  are  specified 
in  the  form  [0|  A]  [77]  [A],  so  that  the  full  model  becomes 


[Z|V,W,0]  ■  [V,W 1 17]  •  [«|A]  •  [r,]  ■  [A], 


(6) 


where  the  form  of  [Z|V,W,0]  is  defined  by  [Y | V,W, and  (5).  Other  forms  of  prior 

specification  could,  of  course,  be  considered,  but  the  form  given  here,  involving  a 
hyperparameter  A  in  the  construction  of  the  prior  for  8,  will  suffice  for  our  later 

illustrative  examples.  We  shail  assume  that  interest  focuses  on  the  marginal  posterior 
distributions  for  the  components  of  6,  [^jZ,V,W],  i  =  1,2,... ,k  as  well  as  perhaps 

[?|z,y,w], 

At  first  sight,  it  appears  natural  to  try  to  implement  the  Gibbs  sampler  using  the 
full  conditional  distributions  for  6^,  6^,. ..,8^,  tj  and  A.  We  note,  however,  that 

[*|Z,Y,W.*j,  j*i,  17,A]  «  (Z|V,W,f|  .  [0|A] 

with  the  right-hand  side  considered  as  a  function  of  ft  for  specified  ft,  # i.  This  leads  to 

difficulties,  since  the  density  [Z|V,W,0]  will  generally  be  awkward  to  deal  with.  Suppose, 

n 

for  example,  that  [Y|V,W,0]  =  [Y|  9\  =  0  f  (Y.|0).  Then  [Z.|V,W,0J  =  f.(Z-|0)  if 

. j=lJ  J  "  J  -  -  -  J  J  - 

V. 

V- <  Z-  <  W.,  but  has  point  masses  ft(V-,  9)  -  f  k(Z\0)dZ  at  Z-  =  V.  and 
J  J  J  J  J  — (j)  J  J  J 

X(W.,  8)  =  /  f.(Z|  ^)dZ  at  Z.  =  W..  Generally,  S-,  and  X  will  not  be  available  in 

J  J  -  '  J  J  J  J 

explicit  form  which  means  that  this  will  also  be  the  case  for  [Z|V,W,0]  whenever  any  Zj 
equals  either  Vj  or  Wj,  i.e.,  whenever  censoring  occurs. 

To  avoid  this  difficulty  suppose  instead  we  treat  Y  as  an  unobservable  and  include 
it  in  the  Gibbs  sampler.  The  model  (6)  now  becomes,  in  its  most  general  form, 


[Z !  y,v,w][y  I  y,w,0](y,w  |  r,M  a][?][a]  (7) 


Here  [Z|Y,V,W]  is,  of  course,  a  degenerate  distribution  and  in  typical  applications  we 
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shall  have  [Y|V,W,0]  =  [Y|  0\.  Note  that  now 

I Z.Y.y.W.tfj,  j#i,  v,X]  .  [y!Y-W.f]  [f|A]  (8) 

The  right  hand  side  of  (8)  is  now  an  explicit  function  of  9 j  and  sampling  no  longer 

presents  a  problem.  As  we  remarked  in  the  previous  section,  under  conjugacy,  sampling 
from  6 ■  will  simply  involve  sampling  from  a  standard  distribution.  Without  conjugacy 

we  will  need  to  sample  from  a  nonstandardized  density  using,  for  example, 
ratio-of-uniforms  or  rejection  techniques.  The  remaining  full  ccnditiona's  required  for  the 
Gibbs  sampler  are  given  by  [^|/i,Y,V,W,0,A]  a  [V,W|  t?][i;],  and  [A  |  Z,Y,y,W,^,^]  a 

[0|A][A]  to  which  similar  remarks  apply,  and  finally  [Y|  Z,V,W,0,77,A]  a  [Z|Y,V,W}  • 

[Y  |  V,W,0].  Again,  for  illustration,  consider  the  typical  case  where  [Y|V,W,0]  =  [Yj  0\  = 
n 

II  f.(Y.|0).  Then  the  Y.,  are  conditionally  independent  and  the  full  conditional 
j=l J  J  ‘  J 

distribution  for  Y-  has  the  following  form:  it  is  degenerate  at  Z.  if  V-  <  Z-  <  W;;  it 

J  J  J  J  w 

has  the  distribution  T(*  |  6)  restricted  to  (-®,  V^]  if  Zj  =  Vj,  and  has  the  distribution 

f  (-|0)  restricted  to  [W.,  ®)  if  Z-  =  W-.  Sampling  the  Y.  is  therefore  routine,  the 

latter  two  cases  being  handled  perhaps  by  the  "one— for-one"  methods  described  in  Section 
3.1. 

3.3  Grouped  Data  Models 

To  illustrate  scored  or  grouped  ordinal  data  models,  suppose  that  instead  of 
observing  the  actual  coordinates  of  a  random  n— vector  Y  we  only  observe  a  score, 

Sj  =  bf  if  at-1  <  Yj  <  at,  j  =  1,2,. ..,n,  t  =  1,2,... ,T,  where  the  at,  bt  are  known  fixed 

constants  (often  with  aQ  =  -m,  a^  =  +®).  Assuming  Y  to  have  a  parametric 

distribution  [Y|0]  and  9  to  have  a  prior  defined  by  [0|A],  [A],  the  Bayesian  model  is 

given  by 

[S|f|[?|A]  [A], 

where  [S  |  0]  is  induced  by  [Y  j  6\. 
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As  in  the  previous  section,  the  natural  Gibbs  sampler,  defined  directly  in  terms  of 
the  full  conditionals,  [A|S,0]  and 

«  |S|_*|[?|A], 

runs  into  trouble  due  to  the  presence  of  [S|0],  which,  in  general,  is  not  an  explicit 

expression  in  terms  of  The  solution,  again,  is  to  include  tbe  unknown  Y  as 

part  of  the  Gibbs  sampler.  The  Bayesian  model  then  becomes 

[S|Yj[Y|*][tf|A]  [A] 

and  the  fill  conditionals  are  git  en  by 

[A|S,Y,f]  .  [f | A]  JA] 

!*ji?,Y.«J,j#i,A]  «  [Y|_0][f|A] 

together  with  the  conditionals  for  Yj  given  Y-,  i#j,  derived  from 

(Y|S,f,A)  .  (S | YJ  [Y|  8\. 

n 

Sampling  is  now  straightforward.  In  particular,  if  [Y|0]  =  D  f.(Y|0)  and  S;  =’bt  the 

*  *  j=l  J  J  -  J  1 

full  conditional  for  Y-  is  simply  f •( •  |  ff)  restricted  to  [a.,  a.]. 

4.  Models:  Specific  Examples 

In  this  section,  we  make  explicit  the  forms  of  the  Gibbs  sampler  arising  from  a 
variety  of  examples  of  the  general  structures  discussed  in  Section  3.  Our  development  is 
designed  to  illuminate  for  the  reader  the  astonishing  simplicity  with  which  the 
appropriately  defined  Gibbs  sampler  solves  the  problem  of  Bayesian  computation  in 
constrained  parameter  and  truncated  data  contexts. 

As  we  remarked  in  Section  1,  there  is  remarkably  little  literature  on  Bayesian 
approaches  to  these  problems  and  that  which  exists  typically  does  not  solve  the  problem  of 
calculating  marginal  densities,  but  only  attempts  limited  inference  summaries  in  the  form 
of  modes  or  means.  Ordered  restricted  inference  is  discussed  a;  length  from  a  frequentist 
perspective  in  the  books  by  Barlow  et  al  (1972)  and  Robertson  et.  al.  (1988).  The  former 
has  some  discussion  of  Bayesian  inference  for  ordered  exponential  family  parameters,  but 
♦his  is  largely  limited  to  a  discussion  of  the  joint  posterior  mode  as  an  isotonic  regression. 
The  latter  provides  a  convenient  review  of  the  brief  Bayesian  literature  on  ordered 
parameters.  We  knew  of  no  systematic  discussion  cT  truncated  data  problems  from  a 
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Bayesian  perspective. 

4.1.  Ordered  Binomial  Parameters 

Suppose  that  we  have  conditionally  independent  observations  Y-  -  Binomial  (n.,pj), 

i  =  l,2,...,k,  where  it  is  known  that  p^  <  P2  <  •••<  Pk  and  we  seek  to  make  inferences 

about  the  p.  (or  functions  thereof,  such  as  p^+j  -  Pj  or  -  Pj)/Pj)»  This  problem 

is  discussed  in  Broffitt  (1987)  and  in  Sharpies  (1988).  It  arises  naturally  in  reliability 
development  testing,  where  interest  often  focuses  on  p^  (see,  for  example,  Smith,  1977, 

and  Fard  and  Dietrich,  1987)  and  in  bioassay,  where  p.  =  p(x^)  may  be  thought  of  in 

terms  of  a  monotonic  increasing  relationship  with  dose  or  stimulus.  The  function  p(x)  is 
referred  to  as  a  potency  curve.  See,  for  example,  Kuo,  1988,  for  recent  discussion  and 
related  references. 

A  flexible,  convenient  form  of  prior  distribution  over  the  simplex 
Sk  =  {(PpPj,- -,Pk) :  0  <  Pj  <  p2  <...<  pk  <  1}  takes  the  form 

k  a.— 1  0.-1 

ck(al,a2’"Mak  ’  ^i»^2’  .^i  Pj)  » 

where  c^  is  the  normalizing  constant  and  q-,^  are  chosen  to  reflect  prior  beliefs.  Note 

that  this  is  equivalent  to  unconstrained  p^  having  been  drawn  independently  from 

Beta(as,/7j)  distributions,  and  that  if  a-  =  a,  0^  =  0,  it  is  equivalent  to  the  p-  being 

order  statistics  from  a  sample  of  k  iid  Beta(a,/?)  variables.  For  integer  the 

constant  c^  can  be  obtained  as  a  finite  multidimensional  summation  (see  Sharpies,  1988). 

By  conjugacy,  the  joint  posterior  [p|Y]  has  the  same  form  as  (9),  but  with  the 

a-v0x  replaced  by  a>  +  Yj.  0i  +  m  -  Y.,  respectively.  (Again,  in  the  case  of  integer  a ^ 

the  exact  marginal  posterior  densities  for  the  p-  can  be  identified  as  very  complicated 

weighted  averages- of  Beta  distributions;  see  Sharpies,  1988.)  Implementation  of  the  Gibbs 
sampler  is  extremely  simple  From  (9),  we  see  immediately  that,  for  i  =  1,2,. ..,k, 
[PjlY,  pj,  j*i]  =  Beta(ai  +  ^  0i  +  m  -  Yj)  restricted  to  p^  <  Pj  <  Pi+1,  with 
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p0  =  0,  pk+i  =  1,  so  that  sampling  reduces  to  interval  restricted  sampling  from  a 

standard  distribution,  as  discussed  in  Section  3.1.  We  need  never  concern  ourselves  with 
calculation  of  c^. 

4.2  Ordered  Exponential  Family  Parameters 

For  conditionally  independent  observations  from  one-parameter  exponential  family 
models,  with  increasing  parameters  and  a  constrained  form  of  conjugate  prior,  the  analysis 
given  in  Section  4.1  generalizes  in  an  obvious  and  straightforward  way.  The  resulting  full 
conditionals  again  reduce  to  interval  restricted  sampling  from  the  standard  posterior  forms 
arising  from  the  conjugate  analysis. 

Detailed  developments  are  given  by  Broffit  (1984),  motivated  by  graduation 
problems  in  actuarial  science.  He  considers  ordered  parameters  from  a  family  of  models  of 
the  form 


((Y|fl  =  »(Y)«b(Y)e_fc<Y),  «>0 


(10) 


which  includes  models  such  as  gamma  with  known  shape  parameters  ,  normal  with  known 
mean  and  Poisson. 

Suppose,  then,  that  conditionally  independent  observations  y-j,  i  =  l,2,...,k, 
j  =  l,2,...,n-  are  available  from  f(*|0j),  where  it  is  assumed  that  6  e 
=  {e/=  :  0  <  Broffitt  suggests  a  convenient  and  flexible 

prior  family  for  6  over  of  the  form 


k  /fl  i 

»7fc)  n  j 

1=1  v  r(«.) 


(ii) 


where  d^  is  the  normalizing  constant  and  £.,  7^  are  chosen  to  reflect  prior  beliefs.  Note 
that  (11)  arises  from  independent  Gamma  priors  were  the  9-  unconstrained.  In  the  case 
where  the  are  integers  Broffitt  obtains  c^  as  a  finite  multidimensional  sum.  The  joint 
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posterior  [0|Y]  has  the  same  form  as  (11)  with  6 ■  replaced  by  6  =6-+  5)  b(Y. .)  and 

1  1  1  j=l  1J 

n . 

*  1  i  _ I 

7j  replaced  by  •  7-  =  (-+  E  c(Yjj))  The  P°steri°r  mean  for  0.  under  the 

*  ^  *  * 

unrestricted  problem  is  F-  =  7. .  Using  isotonic  regression  Broffitt  obtains  the  order 

* 

restricted  Bayes  estimate  for  0.  under  squared  error  loss  as  a  function  of  0-  and  two 
dk's. 

To  implement  the  Gibbs  sampler  we  only  require  the  full  conditional  distribution 
[0j|  Y,0j,  #i],  i  =  l,2,...,k.  Under  (11),  this  is  merely  a  Gamma  (0p  7.)  restricted  to  the 

interval  [0j_p  where  *0  5  *k+l  5  -  As  in  the  previous  example,  we  need  never 

concern  ourselves  with  calculation  of  the  normalizing  constant  d^. 

4.3  Ordered  Multinomial  Parameters 

Sedransk,  et  al  (1985)  discuss  the  problem  of  Bayes  estimation  of  finite  population 
parameters  when  a  random  variable  X  assumes  one  of  a  finite  set  of  values  {bp...,bj,} 

with  probabilities  ppP2,...,Pjp  respectively.  A  particular  application  is  the  case  of 

household  income,  where  bj  might  denote  a  central  value  for  the  income  category. 

Assuming  the  categories  to  be  arranged  in  increasing  order,  we  would  expect  that 
the  pj  would  increase  up  to  some  category,  t,  say  (1  <  t  <  k),  and  then  decrease 

thereafter.  Typically,  t  would  be  unknown.  The  quantity  of  primary  interest  in  such  a 

k 

situation  might  be  the  finite  population  mean,  E  b-p-,  although  other  functions  of  p 

j=l  rj 

could  also  be  of  interest.  A  possible  Bayesian  model  for  such  problems  is  given  by  defining 

k 

Y.  =  #  of  observations  in  category  j,  with  E  Y;  =  n,  so  that  [Y|  p]  = 
J  j=i 1 

Mult(n;  Pp  P2,...,PjI)-  Following  Sedransk  et  al,  given  t  we  specify  a  prior  [p|t]  the 
form 


c(0j, -A;  0  s  P  i 

_j=l  > 


Vl 


(12) 
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L  * 

over  S  =  {(Pp.-MPfc)  :  Pj  <  P2  <•••<  Pt  >  Pt+1  >  -Pfe.  0  <  Pj  <  1,  .S^pj  =  1}  where 

is  the  normalizing  constant.  Note  that  (12)  arises  from  a  Dirichlet  prior  over 
p  were  the  p-  unconstrained. 


Sedransk  et  al  assume  t  is  known  and  only  compute  desired  posterior  expections 
using  Monte  Carlo  itegration,  employing  importance  sampling  to  avoid  calculation  of  c. 
To  implement  the  Gibbs  sampler  requires  the  full  conditional  distribution,  for  p., 


l  *s  l,...,k— 1  (p^  is  a  function  of  these  p.),  [p.|  Y,  pj,  j  =  l,...,k— 1,  j#i,  tj.  This  is  clearly  a 

k— 1 

Beta  distribution  scaled  to  [0,  1—  E  pj  and  then  suitably  restricted  according  to  the 

j=i  r 

j*i 

constraints  determined  by  t.  Thus,  if  t  is  known  the  Gibbs  sampler  also  avoids 
calculation  of  c.  Moreover,  empirical  work  (as  in  Gelfand  and  Smith,  1990)  suggests  that 
iterative  Monte  Carlo  integration  using  the  Gibbs  sampler  will  be  more  efficient  than 
noniterative  Monte  Carlo  integration  such  as  that  used  by  Sedransk  et  al. 

Suppose  t  is  unknown  whence  we  take  it  to  be  random  with  discrete  prior 
Pr(t  =  j)  =  7j,  j  =  l,2,...,k.  We  note  that  [t|Y,p]  is  a  degenerate  distribution.  Therefore 

the  Gibbs  sampler  can  not  be  directly  employed  since  a  condition  for  its  convergence  is 
that  transitions  from  one  t  to  any  other  are  possible.  This  hierarchical  situation  differs 
from  that  in  expression  (4).  There  A  is  a  hyperparameter  having  nothing  to  do  with  the 
order  restrictions.  Here  t  determines  the  restrictions. 

Fort  :ely,  the  marginal  posterior  for  t  can  be  calculated  directly  taking  the  form 


Pi(t=j|Y)  = 


■  •■,£k+Yk ;  j) 

~~ E 

£  c(/J, . ^;t)r,/c(  0j+Yj . Bk+Yk;t) 


(13) 


Evaluation  of  the  2k  c’s  in  (13)  can  be  done  straightforwardly  using  Monte  Carlo 
integration  with  importance  sampling  as  in  Sedransk  et  al.  Thus  we  can  estimate  the 
marginal  posteriors  for  the  p-  by  using  the  relationship 

(P=  I Y]  =  S  [Pj|Y,t][t|Y]. 

1  ■  t=l 
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For  each  given  t  we  can  use  the  Gibbs  sampler  in  the  customary  manner  to  obtain 
samples  approximately  from  [p- 1  Y,t].  We  can  then  resample  from  these  samples  according 

to  [t|Y]  to  obtain  observations  approximately  from  [p- 1 Y].  Full  details  of  all  the 

required  sampling  in  the  context  of  an  illustrative  example  are  given  in  section  5.1. 

Note  that  in  a  different  context  the  sequence  p.  might,  for  instance,  be  assumed  to 

have  a  bimodal  form,  e.g.  for  grouped  data  arising  from  samples  of  exam  scores,  or  from 
samples  of  heights  or  of  weights.  It  is  clear  that  our  formulation  of  the  present  example 
can  be  extended  to  handle  such  cases.  Also  note  that  this  example  is  a  nonparametric 
version  of  the  grouped  ordinal  data  problem.  We  are  concerned  only  with  the  probabilities 
for  the  income  categories  and  not  with  an  underlying  parametric  model  for  the  incomes 
themselves. 

Finally,  extension  to  models  involving  collections  of  independent  multinomials  with 
order  restrictions  perhaps  both  across  and  within  populations  is  straightforward. 

4.4  Ordered  Linear  Model  Parameters 

We  demonstrate  the  potential  of  the  Gibbs  sampler  for  the  Bayesian  analysis  of 
constrained  parameters  in  general  normal  linear  models  by  considering  an  illustrative 
analysis  of  a  simple  two-way  layout.  Application  to  normal  means  without  linear 
structure  appears  in  Gelfand  et  al  (1989);  application  to  ordered  slopes  in  a  change-point 
regression  model  is  given  in  Carlin  et  al  (1989).  Extensions  to  other  problems  will  be 
obvious  from  the  following  development. 

Consider,  then,  a  model  of  the  form 


Y-j  —  otj  +  +  f-j,  i  —  1,2,...,I,  j  —  1,2,...,J, 


(14) 


2 

where  the  e-j  are  independent  N(0,a  ),  and  prior  knowledge  about  the  linear  parameters 

constrains  the  a.  to  be  decreasing  in  i  and  the  ^  to  be  increasing  up  to  some  unknown 

level  t  and  then  decreasing.  Such  a  model  generalizes  the  "response  surface"  priors 
discussed  in  Smith  (1973)  and  finds  application  in  many  contexts  where  factor  levels 
correspond  to  increasing  (decreasing)  levels  of  a  treatment,  fertilizer,  etc.  Other 
applications  occur  in  consumer  preference  studies  (Green  1974,  Green  k  Srinivasan,  1978): 
here,  Y-  might  to  a  scoring  or  ratin>:  of  a  product,  such  as  a  candy  bar,  with  factor  cr 
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corresponding  to  price  level  and  /?j  to  sugar  context  level. 

The  discussion  of  the  previous  sections  indicates  the  obvious  way  to  proceed.  We 
place  a  multivariate  normal  prior  on  the  set  of  or- ,/3j,  independent  of  the  e-j  ignoring  the 

order  restrictions.  To  complete  the  Bayesian  specification  we  place,  say,  an  inverse  gamma 
prior  on  a  and  a  discrete  distribution  on  t.  Simple  conjugate  analysis  (using,  for 
example,  the  algebraic  forms  given  in  Lindley  and  Smith,  1972)  straightforwardly  reveals 
the  full  conditionals  for  the  and  /3.  to  be  univariate  normals  suitably  constrained  (the 

constraints  for  /?j  being  dependent  on  t).  The  full  conditional  for  a  is  the  conjugately 


updated  inverse  gamma,  and  that  for  t  is  obtained  using  the  technique  described  in  the 
previous  section.  Full  detail  on  the  required  sampling  in  the  context  of  an  illustrative 
example  is  given  in  section  5.2. 


4.5  Ordered  and  Data  Constrained  Parameters 


k 

As  an  illustration  of  a  situation  where  the  constraint  set  Sy  discussed  in  Section 


3.1  depends  on  Y,  consider  the  following  model,  which  has  applications  to  reliability 


development  studies  and  survival  analysis.  We  suppose  that  Y-,  i  =  1,2,. ..,k,  j  =  l,2,...,nj 

are  conditionally  independent  observations  from  location  and  scale  exponential  models,  so 
that  Y.j  has  density 


f(Yul  Vi)  =  i<*P«V W  Y„  >  V  0,  a. 


>  0. 


In  the  absence  of  order  restriction  amongst  the  parameters  there  is  recent 
decision— theoretic  discussion  of  simultaneous  point  estimation  of  the  location  parameters  in 
such  models,  assuming  known  scale  parameters  and  vice  versa.  See  for  example  Ebrahimi 
and  Hosmane  (1988),  Das  Gupta  et  al  (1988). 

Here,  we  shall  complete  a  Bayesian  model  specification  by  assuming,  for  purposes  of 
i.ustration,  that  0  <  0^  <  0^  <••  <  0^  are  the  k  order  statistics  from  the  exponential 

nsity  X~ *exp{— 0/A},  with  X  known,  and  that  the  cr.  are  iid  from  IG(a,b),  the  inverse 

mma  density  [ba/r(a)|  [exp{— b  j  with  a,b  known.  We  are  interested  in 

raining  the  marginal  posterior  densities  for  the  0.  and  cr-  (or  functions  thereof),  a 

>blem  which  we  note  is  extremely  unpleasant  using  standard  Monte  Carlo  integration 


18 


due  to  the  awkward  nature  of  Sy,  defined  by  Y-  >  0-x  >  0  and  0^  <  0^  <...<  0^. 

However,  approached  via  the  Gibbs  sampler,  the  analysis  is  very  straightforward. 
In  particular,  consider  the  full  conditional  distributions  for  the  9-  and  a j.  The  cr-  are 

conditionally  independent  with 

[<7(1  Y,  S]  =  IG  (a  +  ni/2,  b  +  !>;(?(  -  «()) 

-  -I*1 

where  Y  =  n.  S  Y. ..  For  6-  we  have 
j=l  1J 

—$■  (i  —  — ) 

IAJY,?,  «j,j#i]  «  «  ^ 

restricted  to  the  interval  <  &  <  (min  Yij)  A  0j+i>  where  These  full 

conditional  distribution  are  therefore  easily  sampled  and  the  Bayesian  analysis 
straightforwardly  implemented. 


fine  Regression  With  Censorint 


As  a  first  illustration  of  a  truncated  data  problem,  consider  the  special  case  of  the 
structure  introduced  in  Section  3.2  where  [Y|  0j  corresponds  to  conditionally  independent 

2 

straight-line  observations  generated  from  =  a  +  /SX-  +  e-  -  N(0,a  ),  i  =  l,2,...,k, 
j  =  1,2,... and  Z  is  defined  by 


z..  = 


YU  $  di 


Yij  >  ^ 


Thus,  at  each  setting  Xj  of  the  regression  variable  there  is  a  cut-off  d-,  above  which  the 

value  of  Yy  cannot  be  observed.  An  application  of  this  model  is  given  in  Schmee  and 

Hahn  (1979)  and  a  Bayesian  analysis  using  adaptive  Gauss— Hermite  quadrature  is  given  in 
Naylor  and  Smith  (1982),  where  various  subtleties  required  in  performing  the  numerical 
integration  are  noted. 

In  contrast,  the  implementation  of  the  Gibbs  sampler  using  the  approach  set  out  in 
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Section  3.2  is  straightforward.  We  include  the  unobserved  Y-  (i.e.,  those  where 

Yjj  >  d.)  as  further  unknowns  in  the  model.  Then,  given  conjugate  normal  priors  for  a,  0 

2  2 
and  inverse  gamma  prior  for  &  ,  it  is  clear  that  the  full  conditionals  for  a,  0  and  a  are 

the  updated  conjugate  forms  obtained  by  standard  Bayesian  analysis  assuming  all  the 

Y^to  be  observed.  The  full  conditional  for  any  unobserved  Y-j  is  simply  N(a  +  /EC,  a  ), 

restricted  to  the  range  Y-  >  dj.  The  required  sampling  from  all  full  conditionals  is 

therefore  immediate. 


4.7  Truncated  Bivariate  Normal  Data 

Consider  a  bivariate  normal  process  where  X-  is  observed  and  then  Y-  is  observed 

only  if  Yj  <  X.,  i  =  l,...,n.  Such  data  arises,  for  example,  in  paired  survival  time  studies 

(perhaps  using  logarithms  of  survival  time)  where  observation  is  terminated  when  the  first 
patient  dies.  The  more  general  version  where  observation  is  terminated  when  either 
patient  dies  can  be  handled  in  a  similar  fashion  to  the  development  given  below. 

More  precisely,  assume  iid  pairs  (X-,Y.)  such  that  for  i  =  1,2,. ..,n 


x,  «  &i  a 

K„)l  =  «((„)•(  2 

Y.  6  a  a 

l  y  xy  y 


» 


(15) 


We  observe  the  pairs  (X-,  Z.)  with  Z-  =  Y.  if  Y.  <  X-;  otherwise  we  observe  (Xj,  *) 
where  =  *  indicates  that  Y-  >  Xj.  Inference  for  $x,  0^  is  of  interest  as  well  as, 

0X 

perhaps,  for  E,  the  covariance  matrix  in  (14).  A  convenient  prior  form  for  (  )  is 

1  I 

N((  ),V)  and  for  E  an  inverse  Wishart,  so  that  [E”  ]  =  W((pR)  l,p)  with  n  ,  a  ,  V,  p 

and  R  assumed  known.  Interest  focuses  on  the  marginal  posterior  distributins  of  0  ,  6 

x  y 

and  E.  Following  the  development  of  Section  3.2,  these  may  be  routinely  obtained  using 
the  Gibbs  sampler.  As  in  section  3.2,  treating  Y  as  an  unobservable,  we  will  need  the  full 
conditional  distributions  of  $x,  6 '  ,  E  and  Yj. 
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Xi  $x  nx 

Letting  T.  =  (  ),  0  =  (  ),  (i  =  (  )  and  using  standard  distribution  theory  it  is 

Y-  0  ~  u 

1  y  *y 

straightforward  to  verify  that: 


[f|Ti,i=l,...,n)Z,E]=N(V(n~1S+Vr1T+n_1S(n_1E+V)^(nlT1+V_1r1) 


-1  , 


rn  ^  _-l 


where  T  =  n  E  T-  and  &om  (16)  the  full  conditional  distributions  for  6  and  0 
i=i  -1  x  y 


are  routinely  obtained, 


P-1|Tj>  Z,«]  =  W((S(Trf)(Ti-«)T  +  /Kf1,  n+p) 


from  which  sample  generation  is  easily  achieved  as  detailed  in  Gelfand  et  si  (1989). 
Routine  analysis  also  shows  that  the  Y.  are  conditionally  independent  with 

b  5 = N«v + yxi -y/4  4  - 

if  Zj  =  *,  and  Yj  degenerate  at  otherwise. 


There  are  a  number  of  obvious  extensions  and  generalizations  of  this  kind  of 

structure.  Examples  include:  further  modeling  of  0  and  0  as  parametric  functions  of 

x  y 

exploratory  variables;  trivariate  or  multivariate  data  and  bivariate  exponential  family 
models  with  conjugate  priors.  The  reader  can  doubtless  think  of  others.  The  point  we  wish 
to  stress  is  that  systematic  use  of  the  ideas  of  Section  3.2  will  lead  in  each  case  to  a 
relatively  straightforward  form  of  Gibbs  sampler,  no  matter  how  seemingly  complex  the 
initial  model  structure  is. 


4.8  Bivariate  Grouped  Data 

Suppose  that  data  from  an  underlying  continuous  bivariate  distribution  has  been 
grouped  into  an  IxJ  table,  and  we  wish  to  make  inferences  about  the  parameters  of  an 
assumed  bivariate  parametric  form  for  the  unobserved  continuous  data. 

In  what  follows,  we  shall  assume,  for  illustration,  an  underlying  bivariate  normal 
population  of  the  form  (15)  given  above  in  Section  4.7.  For  convenience  of  nomenclature, 
we  shall  refer  to  the  two  component  variables  at  'height"  and  "weight",  with  height  groups 
[ai _ i>  a|]»  i  =  and  weight  groups  [b^  p  bj],  j  =  1,...,J  where  aQ  =  bQ  =  0 
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(technically  these  should  be  -®),  ai  =  bj  =  ®.  The  data  consist  of  counts  n-j  i  = 

j  =  where  n^  denotes  the  observed  number  of  individuals  in  height  group  [a- _ a-] 

and  in  weight  group  [b-  ,,  b-].  If  E  E  n..  =  n,  the  n..  are  distributed  as 

J-  J  j  j  J  J 

Mult(n;  pu,  p12,...,PIJ)  where  p^  =  ^(0,53)  =  <  X  <  aj,  bj_j  <  Y  <  bj)  under 

(15).  For  illustration,  we  adopt  the  same  prior  structure  for  0  and  53  as  given  in  the 
previous  section,  i.e.  [0]  =  N(/i.V)  and  [53-1]  =  W((pR)-1,p),  and  we  seek  the  marginal 

posterior  distributions  [0x|n],  [0  |n]  and  [53|n]  where  n  —  (nu»  ni2»-*nij)- 

7  *  X 

The  Gibbs  sampler  is  most  easily  implemented  if  we  include  the  T  =  (v  ), 

-s  ig 


s  =  l,...,n  in  the  model  as  unobservables.  We  then  require  the  full  conditional 
distributions  for  0^,  0y,  53  and  T  =  (Tp...,Tn).  But  [0(T,n,53]  is  exactly  (16)  from  which 

the  full  conditional  distributions  for  0  and  0  can  be  readily  obtained.  Similarly, 

^  7 

[53|T,n,0]  is  exactly  (17).  Finally,  we  need  to  generate  T  ,  s  =  l,...,n  given  n,0  and  E. 

—  —  -  5  *  * 


But  this  merely  requires,  for  each  pair  i,j,  that  we  generate  n.j  independent  observations 
from  (15)  restricted  to  [aj_p  a.]  x  [bj_p  bj].  Each  such  generation  can  be  implemented  by 
drawing  X  from  N(0x,  ax)  restricted  to  [aj_p  a-J  and  then  Y  given  X=x  from 
N(0y+axy(x-0x)/4  <7y  -  0*y/<£)  restricted  to  [b^,  b^J. 

We  note  the  obvious  extension  to  higher  dimensional  tables  arising  from  an 
underlying  multivariate  normal  model.  Another  interesting  extension  arises  if  we  have  a 
collection  of  independent  two  way  tables  arising  from  a  third  classification  variable,  i.e., 
product  multinomial  sampling  (see  Bishop,  Fienberg  and  Holland,  1975).  To  be  concrete, 
suppose  this  third  variable  is  age  and  the  bivariate  groups  do  actually  correspond  to  height 
and  weigh:.  That  is,  grouped  height  and  weight  data  is  supplied  (using  the  same  groups) 
for  a  sample  of  say  5  year  old  children,  a  sample  of  6  year  old  children,  etc..  Under  (15)  it 
seems  reasonable  that  both  0  and  0  should  increase  with  age.  Thus  we  have  both 

*  y 

grouped  c  ;a  and  ordered  parameters  within  one  model.  We  leave  details  of  this  extension 
to  the  ren  er,  who  by  now  will  not  be  surprised  to  find  that  the  Gibbs  sampler  is  very 
straightfc  ard,  despite  the  seeming  awkwardness  of  the  model  and  parameter  constraints. 
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Illustrative  Analysis 

In  this  section,  we  analyse  two  artificial  data  sets,  derived  from  models  based  on 
those  discussed  in  Sections  4.3  and  4.4.  It  will  be  clear  from  Section  4  that  real 
applications  of  these  and  the  other  models  discussed  exist  in  abundance.  Our  purpose  in 
analyzing  artificial  data  sets  generated  from  known  models  is  to  provide  insight  into  and 
calibration  of  the  performance  of  the  methodology  we  have  presented. 

5.1  Multinomial  With  Ordered  Parameters 

As  an  example  of  the  problem  discussed  in  Section  4.3,  Table  1  shows  a  k  =  8  cell 
multinomial  model  and  the  results  of  40  random  draws  from  this  model 


Pi 

.03 

.07 

.10 

.25 

.30 

.12 

00 

o 

.05 

Yi 

1 

4 

1 

12 

13 

4 

4 

1 

Table  1:  Multinomial  Population  and  Generated  Data 

We  assume  that  the  p-  increase  i  =  1,2,. ..,t  and  then  decrease  thereafter  but  otherwise 
Pj  and  t  are  unknown.  We  take  the  generalized  uniform  Dirichlet,  a.  =  1,  i  =  1,2,... ,8 
and  calculate  the  constants  c(l,...,l;t),  c(Y1+l,...,Yg+l;t),  t  =  1,2,... ,8,  as  described  in 
Sedransk  et  al  (1985).  Using  (13)  we  obtain  the  marginal  posterior,  [t|  Y],  which  is  shown 
in  Table  2.  Note  that  despite  only  40  draws  from  an  8  cell  table  and  a  flat  prior  [t  |  Y] 
places  nearly  all  its  mass  on  t  =  4  and  5. 


t 

1 

2 

3 

4 

5 

6 

7 

8 

P(»  1 Y) 

.0000 

.0001 

.0013 

.3527 

.6350 

.0104 

.0005 

.0000 

Table  2:  Marginal  Posterior  Distribution  of  t 

As  remarked  in  Section  4.3,  to  obtain  the  marginal  posteriors,  [Pj|Y],  we 

implement  the  Gibbs  sampler  in  a  slightly  different  way.  We  use  general  k  and 
a  =  (ap...,a^)  in  the  ensuing  details.  Since 
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tPj  I Y]  =  E  [Pj  |  Y,t]  [t  t  Y] 
1  -  1=1  i 


(18) 


and  since  [t  J  Y]  has  already  been  obtained  we  propose  to  sample  from  [p- 1 Y]  by 

randomly  selecting  t  according  to  [t|Y]  and  then  sampling  p-  from  [p.  |  Y,t].  The 

densities  [p.  |  Y,t]  can  be  obtained  using  the  Gibbs  sampler  in  the  customary  fashion,  as 

indicated  in  Section  4.3.  More  precisely,  we  only  require  the  full  conditional  distributions 

k— 1 

for  pj,  i=l,...,k— 1  since  pk  =  1  =  .  E  p..  But  [pjjY.t.Pj,  j=l,2,...,k-l,  #i]  =  Beta(a  + 


k— 1 


and  then  restricted 


Y-,  a,  +  Y, )  scaled  to  the  interval  [0,  a  ]  where  a-  =  1  —  E  p.,#i, 
IKK  11  j=l  ^ 

to  an  interval  determined  by  the  other  Pj’s  and  t,  i.e.,  with  pQ  =  0 

if  i  <  t,  max  (pi_1,  ai~pk_1  <  p{  <  min  (pi+1>  af); 

if  i  >  t,  max  (pi+p  aj-p^j)  <  Pj  <  min  (pw,  ^); 

it  i  =  t,  max  (pt_1,  pt+1,  <  Pt  <  at- 


The  output  from  m  replications  of  the  Gibbs  sampler  will  be  vectors  pj  = 

j  =  l,2,...,m,  such  that  the  pj  are  approximately  distributed  as  [p|Y,t]  and  thus  the 

Pjj,  j  =  are  approximately  distributed  as  [pjly.t]. 

Suppose  we  run  the  Gibbs  sampler  in  this  manner  for  each  t,  t  =  l,...,k.  Then  in 
theory  we  could  obtain  a  kernel  density  estimate  for  each  [p.  |  Y,t],  t  =  l,...,k  and  thus  via 

(18)  a  density  estimate  which  is  a  finite  mixture  of  these.  In  practice  we  would  merely 
randomly  select  t  according  to  [t  |  Y]  and  then  make  an  equally  likely  choice  from  the  set 
of  pjj,  j  =  l,...,m.  This  resampling  procedure  results  in  an  observation  approximately 

distributed  as  [p-|Y|.  Repeating  this  process  a  large  number  of  times  (1000  times  to 

create  the  plots  in  table  2)  we  can  compute  a  kernel  density  estimate  for  [p-|Yj. 

Returning  to  the  analysis  at  the  beginning  of  this  section,  in  Figure  2  we  plot  such 
kernel  density  estimates  for  the  illustrative  set  Pj,  p^.  p^,  p^.  We  note  that  these 

posteriors  reflect  the  order  restrictions  and  have  modes  close  to  the  respective  true  values. 
The  complete  set  of  posterior  modes  is  given  in  Table  3. 
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i 

1 

2 

3 

4 

!  5 

6 

7 

8 

mode  of 
[PiM 

.019 

.061 

.082 

.246 

.289 

.096 

.063 

.019 

Table  3:  Marginal  Posterior  Modes  of  the  [p^  |  Y] 


5.2  Two  Wav  Layout  With  Ordered  Parameters 

Consider  the  problem  discussed  in  Section  4.4  Table  4  presents  a  set  of  data,  Y, 

generated  from  (14)  of  Section  4.4  with  =  2,  =  1,  Og  =  0,  Oj  =  — 2,  0^  =  —  1,  ^  = 

o 

0,  /?2  =  2,  =  —1,  ft.  =  -2,  and  a  =  3.  Thus  for  each  column  cell  expectations  decrease 

while  for  each  row  they  increase  to  the  middle  column  and  then  decrease.  The  data  is 
rather  noisy  often  at  odds  with  these  expectations.  Ordinary  least  squares  analysis 
ignoring  known  order  restrictions  is  terribly  misleading;  =  1.064,  =  — 1.163,  = 

.536,  q4  =  —5.203,  0X  =  -1.737,  02  =  .758,  0Z  =  .283,  0A  =  -3.344,  05  =  -1.917,  a2  = 

3.590.  The  analyst  obtains  estimates  for  the  *i  and  0-  which  Lil  to  meet  the  restrictions 

and  are  often  far  from  the  true  values.  Some  sort  of  constrained  least  squares  solution  (an 
isotonic  regression)  would  be  a  better  freq-entist  approach. 


J 

1 

2 

3 

4 

5 

1 

.982 

1.902 

3.797 

-1.531 

.570 

2 

-1.417 

1.356 

1.287 

-3.629 

-3.413 

3 

-1.601 

4.713 

.814 

.834 

-2.08J 

4 

—4.912 

-4.541 

-4.768 

-9.051 

-2.744 

Table  4:  Generated  Two  Way  Layout  Data 

Bayesian  analysis  using  the  Gibbs  sampler  is  easily  implemented  in  this  case 

2 

yielding  marginal  posterior  distributions  for  the  a-,  the  and  a  .  In  the  process,  using 

say  posterior  modes,  the  isotonic  regression  problem  is  sol  1 

Specific  details  are  as  follows.  Suppose  for  simpl;  y  we  assume  conjugate  normal 

2 

and  inverse  gamma  forms  for  the  and  for  a  ^spectively.  That  is,  ignoring 
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2  2 

restrictions  let  a-  i.i.d.  N(0,a^),  0^  i.i.d.  N(0,a^).  (For  convenience  we  have  centered 

these  priors  at  0  and  have  chosen  the  above  a-,  0^  to  be  approximately  centered  at  0  as 

well).  Let  <r  -IG(a,b)  independent  of  the  a-  and  /L.  We  make  there  priors  rather 
2  2 

vague  by  setting  —  5,  =  5,  a  =  0,  b  =  1.  The  full  conditional  distributions  using 

2  2 

general  o a,  o^,  a,  h  and  incorporating  the  order  restrictions  are: 


o  5^(Y.  -/?•)  cr*cr^ 

k|  Y,  a  I#;,  /?.,  /]  =  N( . a.'J-g-,  a,  -,  i=l,2,.1,4 

4  J  K/r  4- 


5(r“  +  a  5£r>a 
a  a 


restricted  to  (o._p  «j+1)  where  aQ  =  a,.  =  +«,  Y-  =  E  ^ij/5  2111(1  ^j-5- 


0  4^(Y-.-a-)  o4<rz 

Wily,  /?  ,  «4j,  a.,  a2]  =  N(  ■  - ,  -f— j),  j— 1,2 . 5 

J  4<7^+  a  4<r^+<r 


restricted  to  Vi>  j  =  1,2,  (^+1,  j  =  4,5,  [min(^2,  ^4),®),  j  =  3  where 


/?n  =  =  +®»  Y  ■  = 


•j  =  “d  -  =  jI^i/4) 


(<r2|Y,  aj,  i=l,..,4,  j=l . 5]  =  IG(»+10,  b+SS(Yjrai-Wj)2/2). 


As  output  from  running  the  Gibbs  sampler  Figures  2,  3,  ?nd  4  show  the  marginal 

2 

posterior  distributions  for  the  a^,  the  0-  and  a  respectively.  Figures  2  and  3  show  that 

these  marginal  posteriors  respond  to  the  order  restrictions.  In  particular  marginal  posterior 
modes  are  o^=  1.480,  02=  .197,02=  —507, a^=  —3.684,/^=  — 1.039, /?2=  SZ5,0^= 

*  *  9* 

1.261,/?4=  —1.149,^2=  -1.790,(7  =  3.975.  These  are  in  accord  with  the  restrictions  and 

generally  much  cTosf  to  the  true  values  than  the  ordinary  least  squares  estimates.  While  a 
constrained  least  t  ,uares  solution  would  no  doubt  produce  comparably  good  pcint 
estimates,  because  t  e  Gibbs  sampler  enables  marginal  posterior  distributions  for  the  a^, 
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0.  for  e.g.  a+0-  for  e.g.  a  —a  ,  0  — 0  ,  etc.  we  can  in  addition  easily  obtain  Bayesian 
J  i  j  r  s  r  s 

interval  estimates  for  any  functions  of  the  model  parameters  which  may  be  of  interest. 

In  the  above  we  have  assumed  that  0 ^  was  known  to  be  the  largest  0.  If  we  did 

not  know  which  subscript  denoted  the  largest  0  we  could  use  an  approach  analogous  to 
that  described  in  the  previous  example.  If  we  felt  the  data  contained  some  outliers,  we 
might  robustify  the  Bayesian  analysis  by  assuming  that  the  distribution  of  the  errors  in 
(14)  is  [e-  |  A  J  =  N(0,  A^  <r2)  where  v\.  .-Gamma(i//2,  1/2)  so  that  marginally  the 

{.j-t^(0,<r  ).  To  implement  the  Gibbs  sampler  we  would  include  the  Ay  as  unobserved 

variables.  All  full  conditional  distributions  are  straightforward  to  obtain.  We  omit 
details. 

6^  Summary 

Our  intent  has  been  to  describe  how  Bayesian  analysis  of  a  broad  range  of  ordered 
parameter  and  truncated  data  problems  can  be  straightforwardly  implemented  using  the 
Gibbs  sampler.  This  approach  avoids  well-nigh  impossible  numerical  integrations  over 
high  dimensional  sets  defined  by  complex  restrictions.  Rather,  it  only  requires  sampling 
from  univariate  full  conditional  distributions  restricted  to  easily  described  subsets  of  R*. 
With  conjugacy  the  needed  full  conditional  distributions  will  be  standard  probability 
distributions;  without  conjugacy  we  will  have  to  employ  tailored  versions  of  general 
random  variate  generation  procedures.  While  sampling  may  be  inefficient  this  is  more  than 
compensated  for  by  the  ability  to  carry  out  full  Bayesian  calculations  for  many  problems 
which  were  previously  inaccessible.  Two  illustrative  examples  show  how  much  stronger 
inference  can  be  when  restrictions  are  taken  into  account  in  the  modeling  process. 
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